Statistical analysis for exploring environmental Citizen Science practices and scientists’ attitudes at ILTER

In the spirit of open science, open data and reproducible science (Commission, Research, and Innovation 2021) we share the dataset (Bergami et al. 2022) and the statistical analysis used in the papers L’Astorina et al., 2023 DOI and Bergami et al., 2023 DOI for exploring environmental Citizen Science practices and scientists’ attitudes at ILTER, starting from the results of a global survey.

October 10, 2022

The dataset

The exact wording of the questions and of the corresponding answer options is listed in the table below.

Show code
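## import dataset
library(magrittr) # provides the %>% pipe used throughout
dataset <- readxl::read_excel("ILTER_PublicEngagement_forPapers.xlsx")
# age at rendering time: current year minus the value stored in Q33
dataset$age <- as.numeric(format(Sys.Date(), "%Y")) - as.numeric(dataset$Q33)
rmarkdown::paged_table(dataset, options = list(rows.print = 15))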

The questions and the options

Show code
questions <- readxl::read_excel("wording.xlsx")
DT::datatable(
  questions,
  rownames= FALSE,
  extensions = c('FixedColumns', "FixedHeader"),
  options = list(
    dom = 't',
    ordering = F,
    scrollY = '600px',
    paging = FALSE,
    scrollX = TRUE,
    fixedHeader=TRUE,
    fixedColumns = list(leftColumns = 1)
  )
)

If you want to reproduce the analysis, please visit the GitHub repo and either download the compressed repository file or clone it. Please remember to cite this document (Oggioni and Bergami 2022) and the dataset (Bergami et al. 2022) if you use them for other publications or analyses.

General Information

The link to the survey was sent to all the ILTER site managers through the ILTER secretariat contact list (850 email recipients). The questionnaire remained open from the end of February to mid-September 2020 with two reminders sent within this period. In total, we received 163 responses; based on an estimated 850 participating scientists, our response rate is 17%.

Pools of the papers

First part of the survey, used in L’Astorina et al., 2023 DOI

Number of answers with a completeness >= 75 %:

Show code
poolP1 <- dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::count()

165

Second part of the survey, used in Bergami et al., 2023 DOI

Number of answers with a completeness >= 75 % of the columns from Q10 to Q30:

Show code
poolP2 <- dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  # dplyr::filter(Finished == 'True') %>% # = 75
  dplyr::count()

77

Other general info of the dataset

The number of persons who accessed the survey (not necessarily finished it):

Show code
totalAnswers <- dataset %>% 
  dplyr::filter(Finished == 'True' | Finished == 'False') %>% dplyr::count()

296

The response rate is:

Show code
respRate <- round((totalAnswers/850)*100, 2)

34.82 %

The number of persons who finished the survey (no information about the completeness):

Show code
completeAnswers <- dataset %>% 
  dplyr::filter(Finished == 'True') %>% dplyr::count()

142

The response rate considering only the first-part pool (answers with a completeness >= 75 % in the first part of the survey):

Show code
compl50rate <- round((poolP1/850)*100, 2)

19.41
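
The rates in this section all divide a count of answers by the 850 email recipients. A minimal sketch of that arithmetic follows; `response_rate()` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```r
# Hypothetical helper: share of invited recipients, in percent, rounded to 2 decimals.
response_rate <- function(n_answers, n_invited = 850) {
  round((n_answers / n_invited) * 100, 2)
}

# Counts reported in this section
rate_accessed <- response_rate(296)  # everyone who accessed the survey
rate_pool1    <- response_rate(165)  # first-part pool (completeness >= 75 %)
rate_finished <- response_rate(142)  # surveys flagged as finished

print(c(accessed = rate_accessed, pool1 = rate_pool1, finished = rate_finished))
```

Keeping the denominator as a named argument makes it explicit which population each rate refers to.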

The number of answers in which the reference to the ILTER site, via DEIMS.iD, was NOT indicated:

Show code
noDEIMSAnswers <- dataset %>% 
  dplyr::filter(Q30 == 'NA') %>% dplyr::count()

201

The number of answers in which the reference to the ILTER site, via DEIMS.iD, was indicated:

Show code
DEIMSAnswers <- dataset %>% 
  dplyr::filter(Q30 != 'NA') %>% dplyr::count()

95

The number of answers with a DEIMS.iD in the second-part pool (completeness >= 75 %):

Show code
poolP2WithDEIMSID <- dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q30 != 'NA') %>% 
  dplyr::count()

52

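The number of answers with LTER network information in the second-part pool (completeness >= 75 %):

Show code
poolP2WithLTERNetwork <- dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  # unquoted column name: is.na('LTERNetwork') would test a string literal and keep every row
  dplyr::filter(!is.na(LTERNetwork)) %>% 
  dplyr::count()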

77
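
When a filter tests for missing values, it matters whether the column is referenced by name or passed as a quoted string. The sketch below uses a made-up four-row data frame (the network names are placeholders, not values from the survey) and base R subsetting to show the difference.

```r
# Toy data: two rows with a network, two rows without (values are illustrative only)
toy <- data.frame(
  id = 1:4,
  LTERNetwork = c("Network A", NA, "Network B", NA),
  stringsAsFactors = FALSE
)

# Quoted name: is.na() sees a single non-missing string, so the result is FALSE
quoted_test <- is.na("LTERNetwork")

# Unquoted column: one TRUE/FALSE per row
column_test <- is.na(toy$LTERNetwork)

kept_quoted   <- toy[rep(!quoted_test, nrow(toy)), ]  # keeps every row
kept_unquoted <- toy[!column_test, ]                  # keeps rows with a network

print(nrow(kept_quoted))
print(nrow(kept_unquoted))
```

If missing values were stored as the text "NA" rather than as real `NA` (as the `Q30 == 'NA'` comparisons above suggest for that column), a string comparison would be needed instead; which case applies to `LTERNetwork` is not stated in the source.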

The number of respondents who took part in at least one CS initiative:

Show code
participationVSResponsesAll <- dataset %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::count()

96

The number of respondents who took part in at least one CS initiative, among those with a completeness >= 50 %:

Show code
participationVSResponses <- dataset %>% 
  dplyr::filter(as.numeric(Progress) >= 50) %>%
  dplyr::filter(Q10 > 0) %>% 
  dplyr::count()

90

The number of CS initiatives declared among respondents with a completeness >= 50 %:

Show code
csIntiatives <- dataset %>%
  dplyr::filter(as.numeric(Progress) >= 50) %>%
  dplyr::filter(Q10 > 0) %>%
  dplyr::select(Q10) %>%
  dplyr::summarise(Q10 = sum(Q10))

392

Geographic distribution of responses

Show code
## Join with ILTER DEIMS GeoInfo
# Connect and download layers from LTER-Europe's GeoServer
fileName <- tempfile()
download.file("https://data.lter-europe.net/geoserver/deims/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=GetFeature&TYPENAME=deims:ilter_all_formal&SRSNAME=EPSG:4326", fileName)
request <- rwfs::GMLFile$new(fileName)
client <- rwfs::WFSCachingClient$new(request)
ilter_all_formal <- client$getLayer("ilter_all_formal")
## Reading layer `ilter_all_formal' from data source 
##   `/private/var/folders/p1/110rx8q101z0wn0bwh4njrcw0000gn/T/RtmpyTrOO1/file1843219fc0842' 
##   using driver `GML'
## Simple feature collection with 755 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -156.5648 ymin: -78 xmax: 175.085 ymax: 79
## CRS:           NA
Show code
sitesOnSurvey <- ilter_all_formal[ilter_all_formal$deimsid %in% dataset$Q30, ]

htmltools::div(
    style = htmltools::css(width="100%", height='100%'),
    leaflet::leaflet(sitesOnSurvey) %>%
    leaflet::addTiles() %>%
    # addMouseCoordinates() %>%
    # leaflet::setView(lng = , lat = 23.16001, zoom = 1) %>%
    leaflet::addMarkers(
      clusterOptions = leaflet::markerClusterOptions(),
      popup = paste0(
              # "Name: <b>", sitesOnSurvey$name, "</b><br/>",
              "DEIMS.ID: <b><a target = 'blank' href = '", sitesOnSurvey$deimsid, "'>", sitesOnSurvey$deimsid, "</a></b><br/>"
            ),
       group = "Sites"
    ) %>%
    leaflet::addLayersControl(position = 'bottomright',
                              overlayGroups = c(
                                "Sites"#,
                                # "Biome"
                              ),
                              options = leaflet::layersControlOptions(collapsed = FALSE)
    )
)

Research questions

Respondents role

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(Q31) %>%
  dplyr::mutate(Q31 = factor(Q31) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q31 = "Role in the site"
    )
  )
Characteristic | N = 165 [1]
Role in the site
Collaborator | 21 (13%)
Data manager | 9 (5.5%)
National Network coordinator | 11 (6.7%)
Other | 26 (16%)
Site manager | 62 (38%)
(Missing) | 36 (22%)
[1] n (%)

Respondents career level

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(Q32) %>%
  dplyr::mutate(Q32 = factor(Q32) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q32 = "Career level"
    )
  )
Characteristic | N = 165 [1]
Career level
Graduate student | 2 (1.2%)
Junior (for example, post-doc, assistant professor, entry-level researcher) | 17 (10%)
Mid-career (for example, associated professor, mid-level manager) | 37 (22%)
Other | 3 (1.8%)
Retired (including emeritus) | 3 (1.8%)
Senior (for example, professor, senior manager, administrator) | 72 (44%)
(Missing) | 31 (19%)
[1] n (%)
Show code
# same pattern applied to `trial`, the example dataset shipped with gtsummary
gtsummary::trial %>%
  dplyr::select(response) %>%
  # make NA an explicit factor level with `forcats::fct_explicit_na()`
  dplyr::mutate(response = factor(response) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary()
Characteristic | N = 200 [1]
response
0 | 132 (66%)
1 | 61 (30%)
(Missing) | 7 (3.5%)
[1] n (%)

Respondents age distribution

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(c(age, Q10, Q31:Q36)) %>%
  dplyr::mutate(decade = floor(age/10)*10) %>%
  dplyr::select(decade) %>%
  dplyr::mutate(decade = factor(decade) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      decade = "Age class (decade)"
    )
  )
Characteristic | N = 165 [1]
Age class (decade)
20 | 1 (0.6%)
30 | 8 (4.8%)
40 | 21 (13%)
50 | 41 (25%)
60 | 27 (16%)
70 | 5 (3.0%)
(Missing) | 62 (38%)
[1] n (%)
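
The classes in this table come from `floor(age/10)*10`, so they group respondents by age, with age derived from a year stored in Q33. Because the analysis uses `Sys.Date()`, the ages shift by one for every year that passes before the document is rendered. The sketch below pins the reference year to 2020, the year the questionnaire was open; that choice is an assumption made here, not the authors' method, and the birth years are invented.

```r
survey_year <- 2020                            # assumption: year the survey was open
birth_year  <- c(1952, 1968, 1975, 1991, NA)   # invented values for illustration

age    <- survey_year - birth_year             # 68, 52, 45, 29, NA
decade <- floor(age / 10) * 10                 # lower bound of each 10-year age class

print(table(decade, useNA = "ifany"))
```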

Response rate by regions

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(Regions) %>%
  dplyr::mutate(Regions = factor(Regions) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary()
Characteristic | N = 165 [1]
Regions
Africa | 5 (3.0%)
Americas | 6 (3.6%)
East-Asia-Pacific (EAP) | 11 (6.7%)
LTER Europe | 96 (58%)
US LTER | 16 (9.7%)
(Missing) | 31 (19%)
[1] n (%)

Willingness for public engagement

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(Q34_1:Q34_6) %>%
  tidyr::gather(questions, levelOfWillingness) %>%
  dplyr::mutate(questions = ifelse(questions == "Q34_1", "Collaborations with the public on scientific research (i.e., Citizen Science)", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q34_2", "Face-to-face science discussions and activities with the public", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q34_3", "Online science discussions and activities with the public", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q34_4", "Interviews with journalists or other media professionals about science", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q34_5", "Direct interactions with government policy makers about science", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q34_6", "Any form of public engagement with science involving children or young adults (18 years or younger)", questions)) %>%
  dplyr::mutate(
    levelOfWillingness = factor(
      levelOfWillingness,
      levels = c("Very\r\nunwilling", "Slightly \r\nunwilling", "Neither unwilling\r\nor willing", "Slightly \r\nwilling", "Very\r\nwilling")
    ) %>% forcats::fct_explicit_na()
  ) %>%
  gtsummary::tbl_summary(
    by = c(questions),
    label = list(
      levelOfWillingness = "Level of willingness"
    )
  ) %>%
  gtsummary::modify_header(label = "**Questions**")
Level of willingness, n (%); N = 165 for each question
Question | Very unwilling | Slightly unwilling | Neither unwilling or willing | Slightly willing | Very willing | (Missing)
Any form of public engagement with science involving children or young adults (18 years or younger) | 8 (4.8%) | 14 (8.5%) | 13 (7.9%) | 55 (33%) | 75 (45%) | 0 (0%)
Collaborations with the public on scientific research (i.e., Citizen Science) | 11 (6.7%) | 10 (6.1%) | 16 (9.7%) | 34 (21%) | 94 (57%) | 0 (0%)
Direct interactions with government policy makers about science | 9 (5.5%) | 6 (3.6%) | 13 (7.9%) | 55 (33%) | 82 (50%) | 0 (0%)
Face-to-face science discussions and activities with the public | 6 (3.6%) | 8 (4.8%) | 15 (9.1%) | 46 (28%) | 88 (53%) | 2 (1.2%)
Interviews with journalists or other media professionals about science | 11 (6.7%) | 17 (10%) | 18 (11%) | 39 (24%) | 80 (48%) | 0 (0%)
Online science discussions and activities with the public | 9 (5.5%) | 26 (16%) | 23 (14%) | 72 (44%) | 34 (21%) | 1 (0.6%)
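
The question codes are relabelled above with one `ifelse()` call per code. A named character vector can express the same mapping in one place. The sketch below is a base R alternative under that assumption, shown on a short invented vector of codes; it is not the code used for the tables in this document.

```r
# Lookup table: names are the question codes, values are the labels
question_labels <- c(
  Q34_1 = "Collaborations with the public on scientific research (i.e., Citizen Science)",
  Q34_2 = "Face-to-face science discussions and activities with the public"
)

# Invented input, including one code that has no label
toy_codes <- c("Q34_2", "Q34_1", "Q34_2", "Q99")

# Replace known codes with their label, leave unknown codes untouched
relabelled <- unname(ifelse(
  toy_codes %in% names(question_labels),
  question_labels[toy_codes],
  toy_codes
))

print(relabelled)
```

Adding or correcting a label then means editing a single entry of `question_labels` rather than a line of the pipeline.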

Importance of different reasons for participation in citizen science

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(Q6_1:Q6_8) %>%
  tidyr::gather(questions, reasons) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_5", "Educate the public on environmental issues", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_7", "Build relationships between scientists and the public who live and work near LTER Sites or LTSER Platforms", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_8", "Have greater influence on policy by collaborating with the public on scientific research", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_4", "Educate the public on how science research is conducted", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_6", "Bring in perspectives and ideas from the public that can inform scientific research", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_1", "Get help from the public by having them collect or classify data", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_3", "Make a grant proposal more competitive and appealing to funders by including citizen science", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q6_2", "Get help from the public in ways that are not limited to data collection and classification", questions)) %>%
  dplyr::mutate(
    reasons = factor(
      reasons,
      levels = c("Very low\r\nimportance", "Little\r\nimportance", "Moderate\r\nimportance", "High\r\nimportance", "Very high\r\nimportance")
    ) %>% forcats::fct_explicit_na()
  ) %>%
  gtsummary::tbl_summary(
    by = c(questions),
    label = list(
      reasons = "Reasons"
    )
  ) %>%
  gtsummary::modify_header(label = "**Questions**")
Reasons, n (%); N = 165 for each question
Question | Very low importance | Little importance | Moderate importance | High importance | Very high importance | (Missing)
Bring in perspectives and ideas from the public that can inform scientific research | 1 (0.6%) | 15 (9.1%) | 49 (30%) | 67 (41%) | 33 (20%) | 0 (0%)
Build relationships between scientists and the public who live and work near LTER Sites or LTSER Platforms | 4 (2.4%) | 6 (3.6%) | 27 (16%) | 67 (41%) | 61 (37%) | 0 (0%)
Educate the public on environmental issues | 1 (0.6%) | 1 (0.6%) | 19 (12%) | 56 (34%) | 88 (53%) | 0 (0%)
Educate the public on how science research is conducted | 2 (1.2%) | 13 (7.9%) | 39 (24%) | 71 (43%) | 39 (24%) | 1 (0.6%)
Get help from the public by having them collect or classify data | 6 (3.6%) | 31 (19%) | 65 (39%) | 44 (27%) | 19 (12%) | 0 (0%)
Get help from the public in ways that are not limited to data collection and classification | 7 (4.2%) | 28 (17%) | 81 (49%) | 35 (21%) | 12 (7.3%) | 2 (1.2%)
Have greater influence on policy by collaborating with the public on scientific research | 0 (0%) | 13 (7.9%) | 40 (24%) | 62 (38%) | 49 (30%) | 1 (0.6%)
Make a grant proposal more competitive and appealing to funders by including citizen science | 18 (11%) | 24 (15%) | 66 (40%) | 47 (28%) | 9 (5.5%) | 1 (0.6%)

Possible barriers to citizen science

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(Q8_1:Q8_8) %>%
  tidyr::gather(questions, barriers) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_7", "Scientists do not have any or enough support to start and run a citizen science project", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_5", "It is difficult to create long-term stable relationships with the public, which are necessary to conduct scientific research", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_8", "Scientists do not get credit or acknowledgement for their work in citizen science", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_3", "It is too difficult or time-consuming to validate data collected or classified by the public", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_1", "The public does not have the necessary knowledge or skills to contribute to scientific research", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_2", "It is too difficult or time-consuming to teach the public the necessary knowledge or skills to contribute to scientific research", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_6", "It is not possible to acknowledge citizen science volunteers’ contribution in grants, presentations, and publications", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q8_4", "The public is not interested in helping with science research", questions)) %>%
  dplyr::mutate(
    barriers = factor(
      barriers,
      levels = c("Strongly\r\ndisagree", "Disagree", "Neither agree\r\nnor disagree", "Agree", "Strongly\r\nagree")
    ) %>% forcats::fct_explicit_na()
  ) %>%
  gtsummary::tbl_summary(
    by = c(questions),
    label = list(
      barriers = "Barriers"
    )
  ) %>%
  gtsummary::modify_header(label = "**Questions**")
Barriers, n (%); N = 165 for each question
Question | Strongly disagree | Disagree | Neither agree nor disagree | Agree | Strongly agree | (Missing)
It is difficult to create long-term stable relationships with the public, which are necessary to conduct scientific research | 7 (4.2%) | 30 (18%) | 38 (23%) | 68 (41%) | 21 (13%) | 1 (0.6%)
It is not possible to acknowledge citizen science volunteers’ contribution in grants, presentations, and publications | 21 (13%) | 60 (36%) | 56 (34%) | 24 (15%) | 4 (2.4%) | 0 (0%)
It is too difficult or time-consuming to teach the public the necessary knowledge or skills to contribute to scientific research | 13 (7.9%) | 49 (30%) | 49 (30%) | 38 (23%) | 16 (9.7%) | 0 (0%)
It is too difficult or time-consuming to validate data collected or classified by the public | 7 (4.2%) | 37 (22%) | 51 (31%) | 57 (35%) | 12 (7.3%) | 1 (0.6%)
Scientists do not get credit or acknowledgement for their work in citizen science | 10 (6.1%) | 34 (21%) | 51 (31%) | 42 (25%) | 27 (16%) | 1 (0.6%)
Scientists do not have any or enough support to start and run a citizen science project | 5 (3.0%) | 22 (13%) | 39 (24%) | 76 (46%) | 21 (13%) | 2 (1.2%)
The public does not have the necessary knowledge or skills to contribute to scientific research | 15 (9.1%) | 39 (24%) | 53 (32%) | 47 (28%) | 11 (6.7%) | 0 (0%)
The public is not interested in helping with science research | 32 (19%) | 81 (49%) | 39 (24%) | 7 (4.2%) | 5 (3.0%) | 1 (0.6%)

Impacts of Citizen Science on scientists

Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
  dplyr::select(Q29_1:Q29_9) %>%
  tidyr::gather(questions, impacts) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_2", "My involvement in citizen science has given me a better understanding of what the public thinks about scientists and the work they do.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_1", "My involvement in citizen science has given me insight into the concerns that the public has about science.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_4", "My involvement in citizen science has helped me improve how I communicate about my work with stakeholders.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_9", "My involvement in citizen science has helped me place my research in a broader context.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_3", "My involvement in citizen science has given me an opportunity to learn from the public in ways that are relevant to the work that I do.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_6", "My involvement in citizen science has helped me improve how I teach and mentor students and staff.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_5", "My involvement in citizen science has helped me improve how I communicate about my work with scientists outside my field.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_7", "My involvement in citizen science has influenced how I ask research questions.", questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q29_8", "My involvement in citizen science has influenced how I design studies, collect data, or analyze data.", questions)) %>%
  dplyr::mutate(
    impacts = factor(
      impacts,
      levels = c("Strongly\r\ndisagree", "Disagree", "Neither agree\r\nnor disagree", "Agree", "Strongly\r\nagree")
    ) %>% forcats::fct_explicit_na()
  ) %>%
  gtsummary::tbl_summary(
    by = c(questions),
    label = list(
      impacts = "Impacts"
    )
  ) %>%
  gtsummary::modify_header(label = "**Questions**")
Impacts, n (%); N = 165 for each question
Question | Strongly disagree | Disagree | Neither agree nor disagree | Agree | Strongly agree | (Missing)
My involvement in citizen science has given me a better understanding of what the public thinks about scientists and the work they do. | 1 (0.6%) | 4 (2.4%) | 9 (5.5%) | 54 (33%) | 13 (7.9%) | 84 (51%)
My involvement in citizen science has given me an opportunity to learn from the public in ways that are relevant to the work that I do. | 3 (1.8%) | 5 (3.0%) | 17 (10%) | 34 (21%) | 22 (13%) | 84 (51%)
My involvement in citizen science has given me insight into the concerns that the public has about science. | 0 (0%) | 4 (2.4%) | 15 (9.1%) | 46 (28%) | 16 (9.7%) | 84 (51%)
My involvement in citizen science has helped me improve how I communicate about my work with scientists outside my field. | 3 (1.8%) | 9 (5.5%) | 25 (15%) | 32 (19%) | 12 (7.3%) | 84 (51%)
My involvement in citizen science has helped me improve how I communicate about my work with stakeholders. | 1 (0.6%) | 7 (4.2%) | 12 (7.3%) | 36 (22%) | 24 (15%) | 85 (52%)
My involvement in citizen science has helped me improve how I teach and mentor students and staff. | 5 (3.0%) | 3 (1.8%) | 22 (13%) | 35 (21%) | 13 (7.9%) | 87 (53%)
My involvement in citizen science has helped me place my research in a broader context. | 1 (0.6%) | 9 (5.5%) | 14 (8.5%) | 43 (26%) | 13 (7.9%) | 85 (52%)
My involvement in citizen science has influenced how I ask research questions. | 6 (3.6%) | 16 (9.7%) | 20 (12%) | 32 (19%) | 7 (4.2%) | 84 (51%)
My involvement in citizen science has influenced how I design studies, collect data, or analyze data. | 6 (3.6%) | 17 (10%) | 26 (16%) | 21 (13%) | 11 (6.7%) | 84 (51%)

How many citizen science initiatives at your LTER Site or LTSER Platform have you been involved in?

The percentage of respondents involved in at least one CS initiative (among answers with a completeness >= 50 %), relative to the first-part pool (answers with a completeness >= 75 % in the first part of the survey), is:

Show code
partiRate <- round((participationVSResponses/poolP1)*100, 1)

54.5 %

The average number of CS projects per respondent, among those with a completeness >= 75 % in the second part of the survey, is:

Show code
average <- dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
  dplyr::select(Q10) %>%
  dplyr::summarise(average = round(mean(Q10, na.rm=TRUE), 1))

4.6

Does participation in CS differ by gender of ILTER scientists?

Show code
participationCSDifference <- dataset %>%
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>%
  dplyr::select(c(Q10, Q31:Q36)) #%>% View()
participationCSDifference$age <- as.numeric(format(Sys.Date(), "%Y")) - as.numeric(participationCSDifference$Q33)
# Q35 Gender
participationCSDifference %>%
  dplyr::group_by(Q35) %>% 
  # dplyr::summarise(totalCSInitiative = sum(Q10)) %>% 
  dplyr::count(Q35) %>% 
  dplyr::filter(n > 1) %>%
  ggplot2::ggplot(ggplot2::aes(x = Q35, y = n)) +
  ggplot2::geom_bar(stat = "identity", fill = "orange") +
  ggplot2::xlab("Gender") + ggplot2::ylab("Participants in CS initiatives") +
  ggplot2::geom_text(ggplot2::aes(label = n), vjust = 1.6, color = "white", size = 3.5) +
  ggplot2::theme_classic()
Show code
participationCSDifference %>%
  dplyr::select(Q10, Q35) %>%
  gtsummary::tbl_summary(
    label = list(
      Q10 = "CS projects declared by participant (Q10)",
      Q35 = "Gender participant (Q35)"
    )
) %>%
  gtsummary::modify_header(label = "**Questions**")
Questions | N = 76 [1]
CS projects declared by participant (Q10) | 3.0 (2.0, 4.0)
Gender participant (Q35)
Female | 29 (39%)
Male | 45 (61%)
Unknown | 2
[1] Median (IQR); n (%)
Show code
participationCSDifference %>%
  dplyr::select(Q10, Q35) %>%
  gtsummary::tbl_summary(
    by = Q35, # split table by group
    missing = "no", # don't list missing data separately
    statistic = list(gtsummary::all_continuous() ~ "{mean} ({sd})"),
    label = list(Q10 = "CS projects declared (Q10)")
  ) %>%
  gtsummary::add_n() %>% # add column with total number of non-missing observations
  gtsummary::add_p() %>% # test for a difference between groups
  gtsummary::modify_header(label = "**Variable**") %>% # update the column header
  gtsummary::bold_labels()
Variable | N | Female, N = 29 [1] | Male, N = 45 [1] | p-value [2]
CS projects declared (Q10) | 74 | 5.5 (10.1) | 4.1 (4.7) | 0.8
[1] Mean (SD)
[2] Wilcoxon rank sum test
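
The p-value in this table is reported as coming from a Wilcoxon rank sum test, which compares two groups without assuming normally distributed counts. The same kind of test is available in base R as `stats::wilcox.test()`. The sketch below runs it on invented data; the numbers are not from the survey and no result is claimed for them.

```r
# Invented data: number of CS projects for 6 respondents per group
toy <- data.frame(
  Q10 = c(1, 2, 2, 3, 5, 12, 1, 3, 4, 4, 6, 8),
  Q35 = rep(c("Female", "Male"), each = 6)
)

# exact = FALSE requests the normal approximation, which is what R
# falls back to anyway when the data contain ties
res <- wilcox.test(Q10 ~ Q35, data = toy, exact = FALSE)

print(res$method)
print(res$p.value)
```

A rank-based test is a reasonable choice here because project counts are skewed: in the table above the standard deviation for the Female group (10.1) is almost twice its mean (5.5).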

Spatial scale of the CS initiatives

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q13) %>%
  dplyr::mutate(Q13 = strsplit(as.character(Q13), ",")) %>% 
  tidyr::unnest(Q13) %>%
  gtsummary::tbl_summary(
    label = list(
      Q13 = "Spatial scale of the CS initiative"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
Responses | N = 82 [1]
Spatial scale of the CS initiative
International (at multiple national networks) | 10 (12%)
Local (at your site only) | 35 (43%)
National (at national network level) | 14 (17%)
Regional (at several sites in the same region) | 23 (28%)
[1] n (%)
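
Q13 allows several options per respondent, stored as one comma-separated string, which is why the table counts more selections than respondents. The sketch below reproduces the split-and-count step in base R on invented answers; `unlist(strsplit(...))` plays the role of the `strsplit()` plus `tidyr::unnest()` pair used above.

```r
# Invented answers: the second respondent selected two options
answers <- c(
  "Local (at your site only)",
  "Local (at your site only),Regional (at several sites in the same region)",
  "National (at national network level)",
  NA
)

# One element per selected option; a missing answer stays a single NA
selections <- unlist(strsplit(answers, ","))

print(length(answers))      # respondents
print(length(selections))   # selections
print(table(selections, useNA = "ifany"))
```

Splitting on a bare comma assumes that no option label contains a comma itself; a label with an internal comma would be cut into two spurious categories.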
Show code
# dataset %>% 
#   dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
#   dplyr::filter(Q10 > 0) %>% 
#   dplyr::select(c(Q10, Q13)) %>% # 73
#   dplyr::count(Q13) %>% 
#   dplyr::filter(n > 1) %>% 
#   ggplot2::ggplot(ggplot2::aes(x = Q13, y = n)) +
#   ggplot2::geom_bar(stat = "identity", fill = "green4") +
#   ggplot2::xlab("") + ggplot2::ylab("Number of projects") +
#   ggplot2::scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10)) +
#   ggplot2::geom_text(ggplot2::aes(label = n), vjust = 1.6, color = "white", size = 3.5) +
#   ggplot2::theme_classic()

sjPlot::set_theme(
  base = ggplot2::theme_light(),
  axis.tickslen = 0, # hides tick marks
  axis.title.size = .9,
  axis.textsize = .9,
  geom.label.size = 3.5,
  axis.title.y.vjust = 5
)
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(c(Q10, Q13)) %>%
  # split multiple selections first, then count each option
  dplyr::mutate(Q13 = strsplit(as.character(Q13), ",")) %>% 
  tidyr::unnest(Q13) %>%
  dplyr::group_by(Q13) %>%
  dplyr::mutate(freq = dplyr::n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q13,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects")
  )

Temporal scale of the CS initiatives

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q56) %>%
  gtsummary::tbl_summary(
    label = list(
      Q56 = "Projects duration (years)"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
Responses | N = 76 [1]
Projects duration (years) | 4.0 (2.0, 7.0)
Unknown | 1
[1] Median (IQR)
Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q12) %>%
  dplyr::mutate(Q12 = factor(Q12) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    type = list(Q12 ~ "categorical"),
    label = list(
      Q12 = "Projects active/concluded"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
Responses | N = 76 [1]
Projects active/concluded
No | 23 (30%)
Yes | 52 (68%)
(Missing) | 1 (1.3%)
[1] n (%)
Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(c(Q12, Q56)) %>%
  dplyr::group_by(Q12, Q56) %>%
  dplyr::summarise(freq = dplyr::n()) %>% 
  dplyr::filter(!is.na(Q12)) %>% 
  ggplot2::ggplot(ggplot2::aes(x = Q12, y = Q56)) +
  ggplot2::geom_point(ggplot2::aes(size = freq), colour = "#1F78B4") +
  ggplot2::xlab("Is the project still active?") + 
  # y shows Q56 (duration); point size shows freq (number of projects)
  ggplot2::ylab("Project duration (years)") +
  ggplot2::labs(size = "Number of projects") +
  ggplot2::scale_size_continuous(
    breaks = c(2, 4, 6, 8),
    labels = c('<=2', '4', '6', '>=8')
  ) +
  ggplot2::theme_bw()

Research focus

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q14) %>%
  gtsummary::tbl_summary(
    label = list(
      Q14 = "Research focus"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
Responses | N = 76 [1]
Research focus
Biology | 20 (26%)
Environmental science | 38 (50%)
Global change | 6 (7.9%)
Hydrology | 5 (6.6%)
Limnology | 2 (2.6%)
Management | 3 (3.9%)
Oceanography | 2 (2.6%)
[1] n (%)
Show code
sjPlot::set_theme(
  base = ggplot2::theme_light(),
  axis.tickslen = 0, # hides tick marks
  axis.title.size = .9,
  axis.textsize = .9,
  geom.label.size = 3.5,
  axis.title.y.vjust = 5
)
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(c(Q10, Q14)) %>%
  dplyr::group_by(Q14) %>% 
  dplyr::mutate(freq = dplyr::n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q14,
    show.axis.values = FALSE
  )

Research question(s)

Show code
# Thanks to this guide http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know
strings_Q15 <- dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q15) %>%
  dplyr::pull(Q15)

docs <- tm::Corpus(tm::VectorSource(strings_Q15))

# Cleaning the text
toSpace <- tm::content_transformer(function (x , pattern) gsub(pattern, " ", x))
docs <- tm::tm_map(docs, toSpace, "/")
docs <- tm::tm_map(docs, toSpace, "@")
docs <- tm::tm_map(docs, toSpace, "\\|")
# Convert the text to lower case
docs <- tm::tm_map(docs, tm::content_transformer(tolower))
# Remove numbers
docs <- tm::tm_map(docs, tm::removeNumbers)
# Remove english common stopwords
docs <- tm::tm_map(docs, tm::removeWords, tm::stopwords("english"))
# Remove your own stop word
# specify your stopwords as a character vector
# docs <- tm::tm_map(docs, tm::removeWords, c("blabla1", "blabla2")) 
# Remove punctuations
docs <- tm::tm_map(docs, tm::removePunctuation)
# Eliminate extra white spaces
docs <- tm::tm_map(docs, tm::stripWhitespace)
# Text stemming
# docs <- tm::tm_map(docs, tm::stemDocument)

# Build a term-document matrix
dtm <- tm::TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)

# Generate the Word cloud
set.seed(1234)
wordcloud::wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.05, 
          colors = RColorBrewer::brewer.pal(8, "Dark2"))
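
The word cloud is driven by a simple word-frequency table: clean the text, split it into words, drop stop words, and count. The sketch below shows those steps in base R on three invented answers and a deliberately tiny stop-word list; it is an illustration of the idea and not a replacement for the `tm` pipeline above, whose English stop-word list is far longer.

```r
# Invented free-text answers
answers <- c(
  "How does water quality change?",
  "Water quality and bird phenology",
  "Bird migration timing"
)

clean <- tolower(answers)                            # lower case
clean <- gsub("[[:punct:][:digit:]]", " ", clean)    # drop punctuation and numbers
words <- unlist(strsplit(trimws(clean), "\\s+"))     # split on white space

stop_words <- c("how", "does", "and")                # illustrative list only
words <- words[!words %in% stop_words]

freq <- sort(table(words), decreasing = TRUE)        # most frequent words first
print(freq)
```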

Participants number/year

Show code
# dataset %>% 
#   dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
#   dplyr::filter(Q10 > 0) %>% 
#   dplyr::select(Q10, Q17, Q18:Q18_8_TEXT, Q19:Q19_4_TEXT) %>%
#   dplyr::group_by(Q17) %>%
#   dplyr::summarise(numProj = n()) %>%
#   dplyr::mutate(Q17 = forcats::fct_relevel(Q17, "Fewer than 25", "25-50", "51-100", "101-500", "More than 500", "NA")) %>% 
#   ggplot2::ggplot(ggplot2::aes(x = Q17, y = numProj)) +
#   ggplot2::xlab("n participants/year") + ggplot2::ylab("Number of projects") +
#   ggplot2::geom_bar(stat = "identity", fill = "blue4") +
#   ggplot2::geom_text(ggplot2::aes(label = numProj), vjust = 1.6, color = "white", size = 3.5) +
#   ggplot2::theme_classic()

sjPlot::set_theme(
  base = ggplot2::theme_light(),
  axis.tickslen = 0, # hides tick marks
  axis.title.size = .9,
  axis.textsize = .9,
  geom.label.size = 3.5,
  axis.title.y.vjust = 5
)
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q10, Q17) %>%
  dplyr::mutate(Q17 = forcats::fct_relevel(Q17, "Fewer than 25", "25-50", "51-100", "101-500", "More than 500", "NA")) %>%  
  sjPlot::plot_frq(
    Q17,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects")
  )
Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q17) %>%
  dplyr::mutate(Q17 = factor(Q17) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q17 = "n participants/year"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| n participants/year | |
| 101-500 | 7 (9.2%) |
| 25-50 | 12 (16%) |
| 51-100 | 10 (13%) |
| Fewer than 25 | 39 (51%) |
| More than 500 | 8 (11%) |

¹ n (%)
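
The summary table lists the size classes alphabetically, whereas the bar chart orders them from smallest to largest with forcats::fct_relevel(). The following is a minimal base-R sketch of a comparable reordering; the answers are made-up toy values, not survey data, and the names `answers`, `size_order` and `ordered_answers` are illustrative only.

```r
# Toy answers (illustrative only, not taken from the survey dataset)
answers <- c("25-50", "Fewer than 25", "More than 500",
             "Fewer than 25", "101-500", "51-100")

# Desired display order, from the smallest to the largest class
size_order <- c("Fewer than 25", "25-50", "51-100", "101-500", "More than 500")

# Default factor(): levels follow the sorted unique values
levels(factor(answers))

# Explicit levels: tables and plots then follow the size order
ordered_answers <- factor(answers, levels = size_order)
table(ordered_answers)
```

Setting the levels explicitly keeps the ordering in one place, so a table and a plot built from the same factor should stay consistent.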

Type of participants

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q18) %>%
  dplyr::mutate(Q18 = strsplit(as.character(Q18), ",")) %>% 
  tidyr::unnest(Q18) %>%
  gtsummary::tbl_summary(
    label = list(
      Q18 = "Group types"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 192¹ |
|---|---|
| Group types | |
| Adults who are in an organized group (for example birding club) | 32 (17%) |
| Adults who are not in an organized group | 49 (26%) |
| Children and young adults participating as part of a school program (18 years or younger) | 23 (12%) |
| Children and young adults participating through an "out-of-school" program (18 years or younger) | 12 (6.2%) |
| Families with adults and children/young adults | 15 (7.8%) |
| Other | 19 (9.9%) |
| Senior (over 65 years) | 21 (11%) |
| Undergraduate students | 21 (11%) |

¹ n (%)
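
The total here (192) is larger than the 76 filtered responses because Q18 accepts several selections per respondent, and the code splits each answer on commas before counting. A minimal base-R sketch of the same idea follows, as a stand-in for the strsplit() plus tidyr::unnest() step; the data frame `toy` and its values are made up for illustration.

```r
# Toy multi-select answers (illustrative only, not from the survey dataset)
toy <- data.frame(
  id  = 1:3,
  Q18 = c("Undergraduate students,Other",
          "Other",
          "Senior (over 65 years),Other,Undergraduate students"),
  stringsAsFactors = FALSE
)

# Split each answer on commas: one character vector per respondent
selections_per_row <- strsplit(as.character(toy$Q18), ",")

# Flatten to one element per selected option (what unnest() does row-wise)
selections <- unlist(selections_per_row)

nrow(toy)           # number of respondents
length(selections)  # number of selections, which can exceed the respondents
table(selections)   # counts per option
```

Note that this approach assumes no option label itself contains a comma; a label with an embedded comma would be split into two pieces.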
Show code
sjPlot::set_theme(
  base = ggplot2::theme_light(),
  axis.tickslen = 0, # hides tick marks
  axis.title.size = .9,
  axis.textsize = .9,
  geom.label.size = 3.5,
  axis.title.y.vjust = 5
)
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q18) %>%
    dplyr::mutate(Q18 = strsplit(as.character(Q18), ",")) %>% 
  tidyr::unnest(Q18) %>%
  dplyr::group_by(Q18) %>% 
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q18,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "red4"
  )

Underserved communities

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q19) %>%
  dplyr::mutate(Q19 = strsplit(as.character(Q19), ",")) %>% 
  tidyr::unnest(Q19) %>%
  dplyr::mutate(Q19 = factor(Q19) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q19 = "Group types"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 97¹ |
|---|---|
| Group types | |
| Other | 14 (14%) |
| Volunteers who are a minority group(s) in your region | 10 (10%) |
| Volunteers who have limited financial resources | 19 (20%) |
| Volunteers who live in rural areas | 19 (20%) |
| We do not target underserved community members. | 28 (29%) |
| (Missing) | 7 (7.2%) |

¹ n (%)
Show code
sjPlot::set_theme(
  base = ggplot2::theme_light(),
  axis.tickslen = 0, # hides tick marks
  axis.title.size = .9,
  axis.textsize = .9,
  geom.label.size = 3.5,
  axis.title.y.vjust = 5
)
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q19) %>%
  dplyr::mutate(Q19 = strsplit(as.character(Q19), ",")) %>% 
  tidyr::unnest(Q19) %>%  
  dplyr::group_by(Q19) %>% 
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q19,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "red4"
  )

Participation frequency

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q20) %>%
  dplyr::mutate(Q20 = factor(Q20) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q20 = "Participation frequency"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Participation frequency | |
| Four to six times | 10 (13%) |
| More than six times | 17 (22%) |
| Once | 12 (16%) |
| Two to three times | 25 (33%) |
| (Missing) | 12 (16%) |

¹ n (%)
Show code
sjPlot::set_theme(
  base = ggplot2::theme_light(),
  axis.tickslen = 0, # hides tick marks
  axis.title.size = .9,
  axis.textsize = .9,
  geom.label.size = 3.5,
  axis.title.y.vjust = 5
)
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q20) %>%
  dplyr::mutate(Q20 = forcats::fct_relevel(Q20, 
                                    "Once", "Two to three times", "Four to six times", "More than six times")) %>%
  dplyr::group_by(Q20) %>% 
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q20,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "red4"
  )

Type of involvement of the participants in the CS initiatives

Show code
matrix7e <- dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q21_1:Q21_13) %>%
  tidyr::gather(questions, typeOfInvol) %>%
  dplyr::group_by(questions, typeOfInvol) %>%
  dplyr::count() %>%
  dplyr::ungroup() %>%
  tidyr::spread(questions, n) %>% 
  t() %>% 
  data.frame(row.names(.), ., row.names = NULL) %>% 
  `colnames<-`(c('Activity in CS', 'High involvement', 'Moderate involvement', 'Not at all involved', 'Very high involvement', 'Very little involvement', 'NA')) %>% 
  .[-1,-1] %>% .[,-6] %>% 
  as.matrix() %>% 
  `rownames<-`(c(
    'Help define research questions',
    'Help interpret data and draw conclusions',
    'Help disseminate conclusions',
    'Help translate the results into action',
    'Help discuss results and ask new questions',
    'Help gather information and resources for research',
    'Help develop hypotheses',
    'Help design data collection methodologies',
    'Help collect samples or record data',
    'Help classify data',
    'Help process samples',
    'Help validate data', 
    'Help analyze data'
  )) %>% 
  reshape2::melt() 

matrix7e$Var2 <- factor(matrix7e$Var2, levels = c("Not at all involved", "Very little involvement", "Moderate involvement", "High involvement", "Very high involvement"))
matrix7e <- matrix7e[matrix7e$value!=0,]
matrix7e <- matrix7e[!is.na(matrix7e$value),]

# ggplot2 functions are namespaced throughout so the chunk does not depend on library(ggplot2)
ggplot2::ggplot(matrix7e, ggplot2::aes(x = Var2, y = Var1)) + 
  ggplot2::geom_raster(ggplot2::aes(fill = as.numeric(value))) + 
  ggplot2::scale_fill_gradient(low = "grey90", high = "red4", na.value = "grey10", guide = "colourbar") +
  ggplot2::labs(x = "Degree of Involvement", y = "Type of Involvement", fill = "n of answers") +
  ggplot2::scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10)) +
  ggplot2::scale_y_discrete(labels = function(x) stringr::str_wrap(x, width = 30)) +
  ggplot2::theme_classic() +
  ggplot2::theme(
    axis.text.x = ggplot2::element_text(size = 8, angle = 0, vjust = 0.3),
    axis.text.y = ggplot2::element_text(size = 8),
    plot.title = ggplot2::element_text(size = 11)
  )
Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q21_1:Q21_13) %>% 
  tidyr::gather(questions, involvements) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_1", 'Help define research questions', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_10", 'Help interpret data and draw conclusions', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_11", 'Help disseminate conclusions', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_12", 'Help translate the results into action', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_13", 'Help discuss results and ask new questions', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_2", 'Help gather information and resources for research', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_3", 'Help develop hypotheses', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_4", 'Help design data collection methodologies', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_5", 'Help collect samples or record data', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_6", 'Help classify data', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_7", 'Help process samples', questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_8", 'Help validate data',  questions)) %>%
  dplyr::mutate(questions = ifelse(questions == "Q21_9", 'Help analyze data', questions)) %>%
  dplyr::mutate(
    involvements = factor(
      involvements,
      levels = c("Not at all\r\ninvolved", "Very little\r\ninvolvement", "Moderate\r\ninvolvement", "High\r\ninvolvement", "Very high\r\ninvolvement")
    ) %>% forcats::fct_explicit_na()
  ) %>%
  gtsummary::tbl_summary(
    by = c(questions),
    label = list(
      involvements = "Involvements"
    )
  ) %>%
  gtsummary::modify_header(label = "**Questions**")
| Questions | Help analyze data, N = 76¹ | Help classify data, N = 76¹ | Help collect samples or record data, N = 76¹ | Help define research questions, N = 76¹ | Help design data collection methodologies, N = 76¹ | Help develop hypotheses, N = 76¹ | Help discuss results and ask new questions, N = 76¹ | Help disseminate conclusions, N = 76¹ | Help gather information and resources for research, N = 76¹ | Help interpret data and draw conclusions, N = 76¹ | Help process samples, N = 76¹ | Help translate the results into action, N = 76¹ | Help validate data, N = 76¹ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Involvements | | | | | | | | | | | | | |
| Not at all involved | 44 (58%) | 13 (17%) | 4 (5.3%) | 28 (37%) | 24 (32%) | 27 (36%) | 11 (14%) | 19 (25%) | 13 (17%) | 28 (37%) | 35 (46%) | 20 (26%) | 24 (32%) |
| Very little involvement | 16 (21%) | 24 (32%) | 6 (7.9%) | 16 (21%) | 29 (38%) | 21 (28%) | 18 (24%) | 7 (9.2%) | 17 (22%) | 22 (29%) | 17 (22%) | 16 (21%) | 20 (26%) |
| Moderate involvement | 12 (16%) | 18 (24%) | 9 (12%) | 16 (21%) | 15 (20%) | 23 (30%) | 15 (20%) | 16 (21%) | 19 (25%) | 9 (12%) | 14 (18%) | 15 (20%) | 15 (20%) |
| High involvement | 4 (5.3%) | 17 (22%) | 29 (38%) | 11 (14%) | 8 (11%) | 3 (3.9%) | 17 (22%) | 23 (30%) | 16 (21%) | 15 (20%) | 3 (3.9%) | 22 (29%) | 14 (18%) |
| Very high involvement | 0 (0%) | 4 (5.3%) | 28 (37%) | 4 (5.3%) | 0 (0%) | 2 (2.6%) | 15 (20%) | 11 (14%) | 10 (13%) | 2 (2.6%) | 5 (6.6%) | 3 (3.9%) | 3 (3.9%) |
| (Missing) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 2 (2.6%) | 0 (0%) | 0 (0%) |

¹ n (%)

Training methodologies of the volunteers

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q22) %>%
  dplyr::mutate(Q22 = strsplit(as.character(Q22), ",")) %>% 
  tidyr::unnest(Q22) %>%
  dplyr::mutate(Q22 = factor(Q22) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q22 = "Training methodologies"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 134¹ |
|---|---|
| Training methodologies | |
| Mandatory face-to-face short workshop (1 day or less) | 10 (7.5%) |
| Mandatory multi-day training or certification course (more than 1 day) | 5 (3.7%) |
| Mandatory online tutorials | 1 (0.7%) |
| Other | 13 (9.7%) |
| Voluntary face-to-face short workshop (1 day or less) | 47 (35%) |
| Voluntary online tutorials | 9 (6.7%) |
| We do not provide any training or support for volunteers. | 4 (3.0%) |
| Written online or print instructions | 44 (33%) |
| (Missing) | 1 (0.7%) |

¹ n (%)
Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>% 
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q22) %>%
  dplyr::mutate(Q22 = strsplit(as.character(Q22), ",")) %>% 
  tidyr::unnest(Q22) %>%
  dplyr::group_by(Q22) %>% 
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q22,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "yellow4"
  )

Data type

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q23) %>%
  dplyr::mutate(Q23 = strsplit(as.character(Q23), ",")) %>% 
  tidyr::unnest(Q23) %>%
  dplyr::mutate(Q23 = factor(Q23) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q23 = "Data type"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 206¹ |
|---|---|
| Data type | |
| Boolean | 23 (11%) |
| Geographic coordinates | 42 (20%) |
| Images | 40 (19%) |
| Numeric | 53 (26%) |
| Other | 11 (5.3%) |
| Textual | 34 (17%) |
| We do not have volunteers collect data. | 2 (1.0%) |
| (Missing) | 1 (0.5%) |

¹ n (%)
Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
  dplyr::filter(Q10 > 0) %>%
  dplyr::select(Q10, Q23) %>%
  dplyr::mutate(Q23 = strsplit(as.character(Q23), ",")) %>% 
  tidyr::unnest(Q23) %>%
  dplyr::group_by(Q23) %>%
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 3) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q23,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "violetred4"
  )

Quality check

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q25) %>%
  dplyr::mutate(Q25 = strsplit(as.character(Q25), ",")) %>% 
  tidyr::unnest(Q25) %>%
  dplyr::mutate(Q25 = factor(Q25) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q25 = "Quality check"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 157¹ |
|---|---|
| Quality check | |
| Other | 7 (4.5%) |
| These data are checked by other volunteers | 16 (10%) |
| These data are checked by scientists | 63 (40%) |
| These data are checked using automated filters | 10 (6.4%) |
| These data are compared to data submitted by other volunteers or by scientists | 15 (9.6%) |
| These data are compared to volunteers' statements about their confidence in the quality of their submitted data | 9 (5.7%) |
| These data are confirmed through photos vouchers or samples that volunteers submitted with their data | 20 (13%) |
| These data are cross-checked for consistency with existing literature or other repositories | 13 (8.3%) |
| We do not check (validate) data collected or classified by volunteers. | 3 (1.9%) |
| (Missing) | 1 (0.6%) |

¹ n (%)
Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
  dplyr::filter(Q10 > 0) %>%
  dplyr::select(Q10, Q25) %>%
  dplyr::mutate(Q25 = strsplit(as.character(Q25), ",")) %>% 
  tidyr::unnest(Q25) %>%
  dplyr::group_by(Q25) %>%
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q25,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "violetred4"
  )

Ways to share data

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q24) %>%
  dplyr::mutate(Q24 = factor(Q24) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q24 = "Data sharing methodology"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Data sharing methodology | |
| Data are not uploaded to an online system. | 26 (34%) |
| Volunteers can access all data collected or classified by all volunteers and scientists. | 30 (39%) |
| Volunteers can access only the data that they collected or classified. | 10 (13%) |
| (Missing) | 10 (13%) |

¹ n (%)
Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
  dplyr::filter(Q10 > 0) %>%
  dplyr::select(Q10, Q24) %>%
  dplyr::group_by(Q24) %>%
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q24,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "chocolate4"
  )

Ways to share findings

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q26) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "At in-person meetings"),
      "At in-person meetings"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "At virtual meetings"),
      "At virtual meetings"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Through indirect online communications"),
      "Through indirect online communications"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Through direct online communications"),
      "Through direct online communications"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Through written newsletters or reports"),
      "Through written newsletters or reports"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "We do not share findings with volunteers."),
      "We do not share findings with volunteers."
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Other"),
      "Other"
    )
  ) %>%
  dplyr::mutate(Q26 = factor(Q26) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q26 = "Ways to share findings"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Ways to share findings | |
| At in-person meetings | 38 (50%) |
| At virtual meetings | 2 (2.6%) |
| Other | 4 (5.3%) |
| Through direct online communications | 14 (18%) |
| Through indirect online communications | 11 (14%) |
| Through written newsletters or reports | 2 (2.6%) |
| We do not share findings with volunteers. | 4 (5.3%) |
| (Missing) | 1 (1.3%) |

¹ n (%)
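
The chain of replace() calls in the code above maps every Q26 answer that starts with a known option label to that label alone. The same idea can be sketched more compactly in base R, as below. The helper `collapse_to_prefix()` is hypothetical and not part of the original analysis, and the toy answers are made up rather than taken from the survey.

```r
# Hypothetical helper (not in the original analysis): replace every answer
# that starts with one of the given option labels by that label alone.
collapse_to_prefix <- function(x, prefixes) {
  for (p in prefixes) {
    hits <- which(startsWith(x, p))  # which() silently drops NA answers
    x[hits] <- p
  }
  x
}

# Option labels as they appear in the summary table for Q26
option_labels <- c(
  "At in-person meetings",
  "At virtual meetings",
  "Through indirect online communications",
  "Through direct online communications",
  "Through written newsletters or reports",
  "We do not share findings with volunteers.",
  "Other"
)

# Toy answers (illustrative only): trailing text after the label is dropped
toy_answers <- c("At in-person meetings,Other", NA,
                 "Other (toy free text)", "At virtual meetings")

collapsed <- collapse_to_prefix(toy_answers, option_labels)
table(collapsed, useNA = "ifany")
```

Keeping the labels in one vector means the table and the plot can share the same recoding, instead of repeating the replace() chain in both chunks.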
Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
  dplyr::filter(Q10 > 0) %>%
  dplyr::select(Q10, Q26) %>%
  dplyr::group_by(Q26) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "At in-person meetings"),
      "At in-person meetings"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "At virtual meetings"),
      "At virtual meetings"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Through indirect online communications"),
      "Through indirect online communications"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Through direct online communications"),
      "Through direct online communications"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Through written newsletters or reports"),
      "Through written newsletters or reports"
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "We do not share findings with volunteers."),
      "We do not share findings with volunteers."
    )
  ) %>%
  dplyr::mutate(
    Q26 = replace(
      Q26, 
      stringr::str_starts(Q26, "Other"),
      "Other"
    )
  ) %>%
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::filter(freq > 1) %>%
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q26,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "chocolate4"
  )

Ways to acknowledge participants

Show code
dataset %>% 
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%  
  dplyr::filter(Q10 > 0) %>% 
  dplyr::select(Q27) %>%
  dplyr::mutate(Q27 = strsplit(as.character(Q27), ",")) %>% 
  tidyr::unnest(Q27) %>%
  dplyr::mutate(Q27 = factor(Q27) %>% forcats::fct_explicit_na()) %>%
  gtsummary::tbl_summary(
    label = list(
      Q27 = "Ways to acknowledge participants"
    )
  ) %>%
  gtsummary::modify_header(label = "**Responses**")
| Responses | N = 106¹ |
|---|---|
| Ways to acknowledge participants | |
| Other | 7 (6.6%) |
| Volunteers are acknowledged in the Acknowledgments section | 51 (48%) |
| Volunteers are listed as co-authors | 13 (12%) |
| Volunteers' contribution is described within the report or journal article (for example the Methods section) | 27 (25%) |
| We do not acknowledge volunteers' contribution. | 3 (2.8%) |
| (Missing) | 5 (4.7%) |

¹ n (%)
Show code
dataset %>%
  dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
  dplyr::filter(Q10 > 0) %>%
  dplyr::select(Q10, Q27) %>%
  dplyr::mutate(Q27 = strsplit(as.character(Q27), ",")) %>% 
  tidyr::unnest(Q27) %>%
  dplyr::group_by(Q27) %>%
  dplyr::mutate(freq = n()) %>%
  dplyr::ungroup() %>% 
  dplyr::select(-freq) %>% 
  sjPlot::plot_frq(
    Q27,
    show.axis.values = FALSE,
    axis.title = c("", "Number of projects"),
    geom.colors = "chocolate4"
  )

Geographical distribution of the ILTER initiatives

by ILTER Environmental characteristics of the site

Show code
# listOfAllSites <- ReLTER::get_ilter_generalinfo()
# saveRDS(listOfAllSites, file = "ilter_sitesData.rds")
# listOfAllSites <- readRDS(file = "ilter_sitesData.rds")
# remove the sites without geometry
# listOfAllSites <- listOfAllSites[c(1:1226, 1228:1237, 1239:1240, 1242:1243, 1248:1249), ]
# 
# siteWithDeimsId <- dataset %>% 
#   dplyr::select(Q30) %>% 
#   .[-160,] %>% 
#   dplyr::filter(Q30 != "NA") %>% 
#   dplyr::add_row(Q30 = c(
#     "https://deims.org/664177a4-a21a-4f59-9601-00909e275868",
#     "https://deims.org/5a38fc08-5257-4b13-8465-1d50ea166b95",
#     "https://deims.org/96ba6c55-a555-4e96-a3e6-14d6dfe8785b",
#     "https://deims.org/923cb154-83c9-444d-817a-cde7879c09b5"
#   )) %>% 
#   unique() # 84 DEIMS.iD
# sitesOnSurvey <- listOfAllSites[listOfAllSites$uri %in% siteWithDeimsId$Q30, ] # 84 sites compared with ILTER formal sites

# collect biogeographical region and biome from DEIMS site
# sitesOnSurveyEnvChar <- lapply(
#   as.list(sitesOnSurvey$uri),
#   FUN = function(x) {ReLTER::get_site_info(x, category = c("EnvCharacts"))}
# ) %>% 
#   dplyr::bind_rows() %>% 
#   dplyr::select(uri, envCharacteristics.biogeographicalRegion, envCharacteristics.biome)
  
# sitesOnSurvey_2 <- merge(x = sitesOnSurvey, y = sitesOnSurveyEnvChar, by.x = "uri", by.y = "uri", all = T)
# saveRDS(sitesOnSurvey_2, file = "sitesOnSurvey_2.rds")
sitesOnSurvey_2 <- readRDS(file = "sitesOnSurvey_2.rds")
biomeNum <- sitesOnSurvey_2$envCharacteristics.biome[-68] %>% unique() %>% length()
getPalette <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(12, "Set3"))

# Biome map plot
library("rnaturalearth")
library("rnaturalearthdata")

world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
ggplot2::ggplot(data = world) +
  ggplot2::geom_sf() +
  ggplot2::xlab("Longitude") + ggplot2::ylab("Latitude") +
  ggplot2::scale_y_continuous(limits = c(-90, 90), expand = c(0, 0)) +
  ggplot2::scale_x_continuous(expand = c(0, 0)) +
  ggplot2::geom_sf(
    data = sitesOnSurvey_2$geometry[-68], 
    size = 1, 
    ggplot2::aes(
      color = sitesOnSurvey_2$envCharacteristics.biome[-68]
    )
  ) + # feature 68 is excluded because DEIMS-SDR has no biome information for it
  # the biome is mapped to colour, so the palette and legend title are set on the colour scale
  ggplot2::scale_color_manual(name = "Biome", values = getPalette(biomeNum)) +
  ggplot2::ggtitle("ILTER sites in the survey")

by biogeographic distribution

Show code
# Biogeographical Region map plot
nc <- sf::st_read("../TeaBagCatalogue/Maps_export/Zonobiome_poly.shp", quiet = TRUE)
ggplot2::ggplot() +
  ggplot2::scale_y_continuous(limits = c(-90, 90), expand = c(0, 0)) +
  ggplot2::scale_x_continuous(expand = c(0, 0)) +
  ggplot2::geom_sf(data = nc, ggplot2::aes(fill = Legend), lwd = 0) +
  ggplot2::geom_sf(data = sitesOnSurvey_2$geometry[-68], color = "black", size = 1) +
  ggplot2::scale_fill_discrete(name = "Biogeographical Region")

Acknowledgments

We thank the ILTER Coordination Committee and Secretariat for helping to share the survey widely across the whole network, and all the respondents from the ILTER community for their willingness to participate.

All analyses were performed in the R language (R Core Team 2020).

This article was created with the distill R package (Dervieux et al. 2022). The summary tables were made with the gtsummary R package (Sjoberg et al. 2021), the plots with ggplot2 (Wickham 2016) and sjPlot (Lüdecke 2022), and the maps with the leaflet (Cheng, Karambelkar, and Xie 2022), leaflet.extras (Karambelkar and Schloerke 2018) and rnaturalearth (South 2022) R packages. The tidyr (Wickham and Girlich 2022), dplyr (Wickham et al. 2022), tm - Text Mining (Feinerer, Hornik, and Meyer 2008), wordcloud (Fellows 2018) and stringr (Wickham 2022) packages were used to prepare the data for analysis. The ReLTER package (Oggioni et al. 2022) was used to access information on ILTER sites in DEIMS-SDR (Wohner et al. 2019).

References

Bergami, Caterina, Cathlyn Merritt Davis, Alessandro Campanaro, Alessandra Pugnetti, Alba L’Astorina, and Alessandro Oggioni. 2022. “Survey dataset - Environmental Citizen Science: practices and scientists’ attitudes at ILTER.” Zenodo. https://doi.org/10.5281/zenodo.7148597.
Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2022. Leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.
Commission, European, Directorate-General for Research, and Innovation. 2021. Horizon Europe, Open Science: Early Knowledge and Data Sharing, and Open Collaboration. Publications Office of the European Union. https://doi.org/10.2777/18252.
Dervieux, Christophe, JJ Allaire, Rich Iannone, Alison Presmanes Hill, and Yihui Xie. 2022. Distill: ’R Markdown’ Format for Scientific and Technical Writing.
Feinerer, Ingo, Kurt Hornik, and David Meyer. 2008. “Text Mining Infrastructure in R.” Journal of Statistical Software 25 (5): 1–54. https://www.jstatsoft.org/v25/i05/.
Fellows, Ian. 2018. Wordcloud: Word Clouds. https://CRAN.R-project.org/package=wordcloud.
Karambelkar, Bhaskar, and Barret Schloerke. 2018. Leaflet.extras: Extra Functionality for ’Leaflet’ Package. https://CRAN.R-project.org/package=leaflet.extras.
Lüdecke, Daniel. 2022. sjPlot: Data Visualization for Statistics in Social Science. https://CRAN.R-project.org/package=sjPlot.
Oggioni, Alessandro, and Caterina Bergami. 2022. Statistical Analysis for Exploring Environmental Citizen Science Practices and Scientists’ Attitudes at ILTER (version 1.0). Zenodo. https://doi.org/10.5281/zenodo.7472886.
Oggioni, Alessandro, Micha Silver, Luigi Ranghetti, and Paolo Tagliolato. 2022. Ropensci/ReLTER: ReLTER V1.1.0 (version 1.1.0). Zenodo. https://doi.org/10.5281/zenodo.5576813.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13: 570–80. https://doi.org/10.32614/RJ-2021-053.
South, Andy. 2022. Rnaturalearth: World Map Data from Natural Earth. https://docs.ropensci.org/rnaturalearth (website) https://github.com/ropensci/rnaturalearth.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2022. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Maximilian Girlich. 2022. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
Wohner, Christoph, Johannes Peterseil, Dimitris Poursanidis, Tomáš Kliment, Mike Wilson, Michael Mirtl, and Nektarios Chrysoulakis. 2019. “DEIMS-SDR – a Web Portal to Document Research Sites and Their Associated Data.” Ecological Informatics 51: 15–24. https://doi.org/10.1016/j.ecoinf.2019.01.005.

Citation

For attribution, please cite this work as

Oggioni & Bergami (2022, Oct. 10). Statistical analysis for exploring environmental Citizen Science practices and scientists' attitudes at ILTER. Retrieved from https://oggioniale.github.io/CSSurveyAnalysis/

BibTeX citation

@misc{oggioniCSI2022,
  author = {Oggioni, Alessandro and Bergami, Caterina},
  title = {Statistical analysis for exploring environmental Citizen Science practices and scientists' attitudes at ILTER},
  url = {https://oggioniale.github.io/CSSurveyAnalysis/},
  year = {2022}
}